AMENDED TITLE

Please amend the Title of the Invention as follows: "A <u>SYSTEM AND</u> METHOD TO IMPROVE THE EFFICIENCY OF SYNCHRONOUS MIRROR DELAYS AND DELAY LOCKED LOOPS."

TECHNOLOGY CENTER 2800



# On page 2, please amend the Paragraph beginning at line 16 as follows:

For the conventional SMD implementations, two delay lines are required, one for delay measurement, one for variable mirrored delay. The effective delay length for both delay lines is defined as:

$$t_{delay} = t_{ck} - t_{mdl}$$

where $t_{ck}$ is the clock period, and $t_{mdl}$ is the delay of an input/output ("I/O") model, including clock input buffer, receiver, clock tree and driver logic. The number of delay stages required for each delay line is given by:

$$N = \frac{t_{delay}}{t_d} = \frac{t_{ck} - t_{mdl}}{t_d}$$

where  $t_d$  is the delay per stage. The worst case number is given by:

$$N_{worst} = \frac{t_{ck} (long) - t_{mdl} (fast)}{t_{d} (fast)}$$
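As a rough illustration of the stage-count formula above, the following sketch evaluates $N$ for one set of values. The function name `delay_stages` and the choice to round up to a whole number of stages are assumptions made for illustration. The sample values (a 15 ns clock period, a 1 ns model delay and 110 ps per stage) are the ones quoted in the worked examples later in this response.

```python
import math


def delay_stages(t_ck_ps: int, t_mdl_ps: int, t_d_ps: int) -> int:
    """Return N = (t_ck - t_mdl) / t_d, rounded up to whole stages.

    All times are in picoseconds. Rounding up is an assumption: a delay
    line cannot contain a fractional stage.
    """
    if t_d_ps <= 0:
        raise ValueError("delay per stage must be positive")
    t_delay_ps = t_ck_ps - t_mdl_ps
    if t_delay_ps < 0:
        raise ValueError("t_mdl exceeds t_ck; formula does not apply")
    return math.ceil(t_delay_ps / t_d_ps)


# Worst case: longest clock period, fastest model delay, fastest stage.
n_worst = delay_stages(t_ck_ps=15_000, t_mdl_ps=1_000, t_d_ps=110)
print(n_worst)
```

Integer picoseconds are used so that the division is the only floating-point step.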





# On pages 5-14, please replace the Detailed Description of the Preferred Embodiment with the following:

Referring now to Fig. 1, a system in accordance with the present invention is shown generally by the numeral 10. The system 10 includes a synchronous mirror delay (SMD) circuit 12 and a phase detector control block 14. An external clock signal 16 is input into receiver and buffer 18. This produces clock input signal 20 (CIN), inverted clock input signal (CIN') 21 and clock delay signal 22 (CDLY). Clock delay signal 22 is delayed by an I/O system delay $t_{mdl}$ illustrated by block 24. CDLY 22 is also directly fed via line 23 into the SMD 12.



Phase detector control block 14 includes phase detector 26 and associated logical circuitry. The goal of the present invention is to take clock input signal 20 and clock delay signal 22 and, by defining certain characteristics and relationships about the timing of the signals, delineate specific conditions under which the circuit is operating, and direct the signal accordingly. Ultimately, the phase of the signals will determine whether CIN 20 or CIN' 21 is used as the input to the SMD, or whether the SMD is bypassed altogether. Although a specific logic arrangement is shown, it is contemplated that any suitable control logic may be used to define the conditions of the signals and select them accordingly. Associated with the phase detector is a multiplexor 28 which is used as an input selection multiplexor, that is, to determine which input (CIN or CIN') to send to the SMD 12, based on the difference between CIN signal 20 and CDLY signal 22. The outputs (collectively 32) of phase detector 26, which will be described in further detail with respect to Fig. 2, are fed into control circuitry block 30. Control circuitry block 30 may be, for instance, a decoder, although any suitable logic is contemplated. The outputs 38 and 40 of control circuitry block 30 will be used to select the outputs for multiplexors 28 and 46, respectively. Based on the signal 38 from control circuitry block 30, input multiplexor 28 will select either CIN 20 or CIN' 21 to be placed on line 48. The output multiplexor 46 is used in combination with the control circuitry block 30 to select which signal is to be put on output line 50. Line 48 (either CIN signal 20 or CIN' signal 21) is directed into the SMD 12. Line 48 is also directed via connection 34 to an input of output selection multiplexor 46.

As is known in the art, the SMD 12 includes a measurement delay line composed of a plurality of serially cascaded delay elements (not shown), the measurement delay line having a measurement delay line input and a measurement delay line output. Each delay stage is a delay element with control gates. An output of the measurement delay line is used as the input to a variable delay line. The variable delay line is also a plurality of serially connected delay elements (not shown), the variable delay line having a variable delay line input and a variable delay line output. The output of the variable delay line of the SMD 12 is output signal SMDOUT 44. Output signal SMDOUT 44 is used as the input to output multiplexor 46. In some circumstances, it is desired to entirely bypass SMD 12, and in such a case, control circuitry block 30 will send a control signal 40 selecting line 34 rather than SMDOUT 44 as the output 50 of output selection multiplexor 46. As a result, line 48 (either CIN signal 20 or CIN' signal 21) will be used as the input for output selection multiplexor 46. In other cases, the control circuitry block 30 will send a control signal 40 selecting signal SMDOUT 44 from SMD 12. Having selected either signal 34 or signal SMDOUT 44, output signal 50 is used as the input to clock tree 54. As is known, a clock tree is a circuit used for distributing a local clock signal. A clock tree may include an internal buffer in order to amplify, buffer and delay the signal in order to form internal clock signal CLKIN 56. Although not shown, it is contemplated that an inverter may be placed before the clock tree 54 in order to invert the clock signal if desired. In this manner, internal clock signal CLKIN 56 will be matched to the external clock 16.

Referring now to Fig. 2, phase detector 26 is described in more detail. Phase detector 26 receives clock input signal 20 and clock delay signal 22. Clock delay signal 22 is used as clock inputs 58 and 60 into registers 62 and 64, respectively. Although D flip-flops are used as registers 62 and 64, it is contemplated that any suitable logic device suitable for the application may be employed. Signal 22 is input into the clock inputs for the D flip-flops. Clock input signal CIN 20 is input as the D inputs 66 and 68 of flip-flops 62 and 64, respectively. Input 68 is delayed from clock input signal 20 by t<sub>d</sub> 70, which is representative of the delay per stage, and therefore there is a delay between input signals 66 and 68 of a period t<sub>d</sub> 70. Flip-flops 62 and 64 output signals 32 and 34, respectively. The logical levels, i.e., a logical 1 or a logical 0, of signal branches 32 and 34 determine the condition under which the CIN signal 20 and CDLY signal 22 are operating. The signal conditions are based on their individual timing characteristics.
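The sampling behavior of the two flip-flops can be sketched behaviorally as follows. This is a minimal model under stated assumptions, not the circuit of Fig. 2: CIN is taken to be an ideal square wave with a 50% duty cycle and a rising edge at time zero, set-up and hold times are ignored, and the names `cin_level` and `phase_detect` are hypothetical. The sketch does not assert which sampled value corresponds to which numbered output line.

```python
def cin_level(t_ps: int, t_ck_ps: int) -> int:
    """Ideal 50% duty-cycle CIN: high during the first half of each period."""
    return 1 if (t_ps % t_ck_ps) < t_ck_ps / 2 else 0


def phase_detect(t_mdl_ps: int, t_ck_ps: int, t_d_ps: int) -> tuple[int, int]:
    """Sample CIN and CIN delayed by t_d at the rising edge of CDLY.

    The CDLY rising edge is assumed to occur t_mdl after a CIN rising edge.
    Returns (sample of CIN, sample of CIN delayed by t_d).
    """
    sample_cin = cin_level(t_mdl_ps, t_ck_ps)
    # A signal delayed by t_d has, at time t, the value CIN had at t - t_d.
    sample_cin_delayed = cin_level(t_mdl_ps - t_d_ps, t_ck_ps)
    return sample_cin, sample_cin_delayed


T_CK, T_D = 10_000, 100  # illustrative values in picoseconds
print(phase_detect(3_000, T_CK, T_D))   # CDLY rises while both inputs are high
print(phase_detect(7_000, T_CK, T_D))   # CDLY rises while both inputs are low
print(phase_detect(10_050, T_CK, T_D))  # CDLY rises just after a CIN rising edge
print(phase_detect(5_050, T_CK, T_D))   # CDLY rises just after a CIN falling edge
```

Under these assumptions the two samples agree when CDLY rises well inside either half of the CIN period, and disagree only when CDLY rises within a window of width t<sub>d</sub> after an edge of CIN, which gives four distinguishable combinations.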

Referring now to Fig. 3, a clock diagram is shown illustrating one possible combination of timing characteristics of CIN signal 20 and CDLY signal 22. CIN signal 20 fires first, and the characteristic $t_{mdl}$, which is the delay of the I/O model, is measured from the rising edge 23A to the rising edge 25A of CDLY signal 22. The entire period of CIN signal 20, that is, the measurement of rising edge 23A to the next rising edge 23B, is defined as the clock period or $t_{ck}$. Therefore, the time defined from the rising edge 25A of CDLY signal 22 to the next rising edge 23B of the CIN signal 20 defines a delay, $t_{delay}$ 27A, which may be defined by $t_{ck}$ minus $t_{mdl}$. This series of timing characteristics would occur when CDLY signal 22 fires after the first falling edge 29A of CIN signal 20. This sampling of CIN from rising edge to rising edge requires a given number of delay stages to accomplish, where the total delay of these delay stages is $t_{delay}$, which is less than half of $t_{ck}$.



Referring now to Fig. 4, an alternate timing diagram is shown for CIN signal 20 and CDLY signal 22. These timing characteristics would occur when the rising edge 25B of CDLY signal 22 occurred prior to the falling edge 29B of CIN signal 20. Again, the delay between the firing at the rising edge 23B of CIN signal 20 and rising edge 25B defines the period of delay for the I/O model $t_{mdl}$. Because the period of time from rising edge 23B to falling edge 29B represents half of the clock period $t_{ck}$, that portion of the signal may be represented by $t_{ck}/2$. Therefore, that distance minus the delay period for the I/O model $t_{mdl}$ results in the delay period 27B, in this case defined as $t_{ck}/2$ minus $t_{mdl}$. Therefore, if the phase detector analyzes when the rising edge of CDLY signal 22 occurs with respect to the falling edge of CIN signal 20, a distinction can be made with respect to the timing characteristics of the individual signals 20 and 22. Since the total delay required from the SMD for synchronization is reduced from ($t_{ck}$ minus $t_{mdl}$) to ($t_{ck}/2$ minus $t_{mdl}$) where $t_{mdl}$ is less than $t_{ck}/2$, more than half of the delay stages can be saved with this invention. The present invention takes advantage of the ability to sample from a rising edge 23B to falling edge 29B, so that fewer delay stages are required in the SMD.

Referring now to Fig. 4a, the timing diagram is shown illustrating the lock conditions. CIN signal is shown as well as CIN plus t<sub>d</sub>, where t<sub>d</sub> represents the delay between the two signals. In lock condition 3, signal CDLY is shown rising between the rising of CIN and CIN plus t<sub>d</sub>, and falling between the falling of CIN and CIN plus t<sub>d</sub>, respectively. Under this circumstance, a lock condition exists and the synchronous mirror delay is bypassed. Under lock condition 4, CDLY signal rises between the falling edge of CIN and the falling edge of CIN plus t<sub>d</sub>. And CDLY falls between the rising edge of CIN and the rising edge of CIN plus t<sub>d</sub>. Again, a lock condition exists and again the synchronous mirror delay is bypassed.

USSN 09/921,614 PATENT RESPONSE

Referring now to Fig. 5, the four possible combinations of the logical levels of PH1 signal 32 and PH2 signal 34 are illustrated. The condition of the signals may be determined from the logical levels on these lines.

## Condition (1):

$$t_{mdl} > t_{ck}/2$$

For condition (1), the effective delay length in the SMD is equal to $t_{ck} - t_{mdl}$. When locking,

$$t_{lock} = d_{in} + t_{mdl} + (t_{ck} - t_{mdl})\ \text{(measured)} + (t_{ck} - t_{mdl})\ \text{(variable)} + d_{out} = 2t_{ck} + d_{in} + d_{out} - t_{mdl} \approx 2t_{ck}$$

where $d_{in}$ and $d_{out}$ are I/O intrinsic delays on which $t_{mdl}$ is represented or modeled.

This is the conventional equation to calculate the lock time of the SMD, which is two clock cycles.

## Condition (2):

$$t_{mdl} < t_{ck}/2$$

Under this condition, a multiplexor is used to select a different phase of CIN to feed into the SMD and the effective delay length is equal to $t_{ck}/2 - t_{mdl}$.

Again,

$$t_{lock} = d_{in} + t_{mdl} + (t_{ck}/2 - t_{mdl}) + (t_{ck}/2 - t_{mdl}) + d_{out} = t_{ck} + d_{in} + d_{out} - t_{mdl} \approx t_{ck}$$

The lock time is decreased to only one clock cycle. From the previous example,

$$N_{\text{worst}} = \frac{15\text{ns}/2 - 1\text{ns}}{110\text{ps}} = 59 \text{ stages}$$

compared to 128 stages without the invention.
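The lock-time expressions for conditions (1) and (2) can be checked numerically with a short sketch. The function name `lock_time` and all numeric values below are hypothetical. The values are chosen so that $d_{in} + d_{out}$ equals $t_{mdl}$ exactly, which is an idealization of the statement that $t_{mdl}$ models those intrinsic delays.

```python
def lock_time(t_ck: float, t_mdl: float, d_in: float, d_out: float,
              half_period: bool) -> float:
    """Lock time per the expressions for condition (1) or condition (2).

    half_period=False: effective delay is t_ck - t_mdl   (condition 1)
    half_period=True:  effective delay is t_ck/2 - t_mdl (condition 2)
    The effective delay is traversed twice: once in the measurement delay
    line and once in the variable delay line.
    """
    reference = t_ck / 2 if half_period else t_ck
    effective = reference - t_mdl
    return d_in + t_mdl + effective + effective + d_out


T_CK = 10_000  # illustrative clock period in picoseconds

# Condition (1): t_mdl > t_ck/2, with d_in + d_out assumed equal to t_mdl.
t1 = lock_time(T_CK, t_mdl=6_000, d_in=2_000, d_out=4_000, half_period=False)

# Condition (2): t_mdl < t_ck/2, with d_in + d_out assumed equal to t_mdl.
t2 = lock_time(T_CK, t_mdl=2_000, d_in=800, d_out=1_200, half_period=True)

print(t1 / T_CK, t2 / T_CK)  # lock time expressed in clock cycles
```

With the idealized inputs the residual term $d_{in} + d_{out} - t_{mdl}$ vanishes, so the results land on exactly two clock cycles and one clock cycle respectively.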

## Condition (3):

When $t_{mdl} = t_{ck}$, the phase detector would declare a lock condition and the clock signal CIN is output directly without even passing into the SMD. The SMD may be disabled to save power.

## Condition (4):



When  $t_{mdl} = t_{ck}/2$ , the CIN is inverted and the SMD may be disabled to save power.

It is contemplated that the present invention will reduce the number of effective delay elements used in the SMD when the signals are found to be under condition (2), saving both silicon area and power in the memory device, which is the primary goal.

For conditions (2) and (4), if there is a severe duty cycle distortion, the falling edges of CIN cannot provide a correct reference to adjust the delay, which would result in a large skew (phase error) at the output.

Referring now to Fig. 6, a flowchart illustrating a methodology associated with the present invention is disclosed. At the start 70, the present invention is used for those circuits in which it is desired to reduce the number of delay stages and there is negligible duty-cycle distortion. Therefore, signal CIN, inverted CIN and CDLY are provided in step 72. CDLY is delayed by the delay of the I/O system. In step 74, a phase detector is interposed between the synchronous mirror delay and CIN and CDLY signals. Both CIN and CDLY are input into the phase detector 76, after which it is necessary to determine, based on the timing characteristics and relationships of CIN to CDLY, which condition or phase the timing signals are in 78. This leads to a series of four decisions 80a through 80d used to determine the relationship of the particular timing characteristics t<sub>mdl</sub> versus t<sub>ck</sub>. Although the series of decisions is shown made in a serial fashion, that is, 80a prior to 80b and so on, these operations could also be rearranged to run in other serial orders or in parallel, so long as the determinations are made. In decision 80a, it is determined whether $t_{mdl}$ is greater than $t_{ck}/2$ but less than $t_{ck}$. If so 82a, condition 1 is triggered 84a, in which the lock time is equal to two clock cycles, which is the conventional synchronous mirror delay lock time. In a conventional manner, CIN is then fed into the synchronous mirror delay. The SMDOUT signal is input into the clock tree. If condition 1 is not satisfied 81, it is determined whether $t_{mdl}$ is less than $t_{ck}/2$ in decision 80b. If so 82b, condition 2 is implicated, in which the lock time is equal to approximately one clock cycle, or approximately half of the conventional synchronous mirror delay lock time. CIN is then inverted and fed into the synchronous mirror delay. The SMDOUT signal is input into the clock tree.
If condition 2 is not satisfied 83, it is determined whether t<sub>mdl</sub> is equal to t<sub>ck</sub> in decision 80c. If so 82c, condition 3 84c is implicated, and lock has already occurred, so a lock is declared and the synchronous mirror delay is bypassed. The CIN signal is input directly into the clock tree for internal production of the clock. If none of these conditions are true 85, it is determined in decision 80d whether t<sub>mdl</sub> is equal to t<sub>ck</sub>/2. If so 82d, condition 4 84d is implicated and it is merely necessary to invert the CIN signal or use an inverted CIN to be input into the clock tree. Again, since there is no need for further delay, the synchronous mirror delay is bypassed and, in a preferred embodiment, may be disabled in order to save power. The CIN' signal is input into the clock tree, again to distribute the internal clock signal. The result of all four conditions 84a-d is that lock 86 occurs with an overall reduction in delay stages, which is the purpose of the circuit, while maintaining the desired operating range.
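The four decisions of the flowchart can be sketched as a single selection function. This is an illustrative sketch only: the names `Selection` and `select_condition` are hypothetical, and the tolerance `tol` used to test the two equality conditions is an assumption, since a practical phase detector resolves equality only to within some window. The equality tests are placed first, which the description permits because the decisions may be reordered.

```python
from typing import NamedTuple


class Selection(NamedTuple):
    condition: int      # 1 through 4, as numbered in the description
    clock_source: str   # "CIN" or "CIN'"
    bypass_smd: bool    # True when the SMD is bypassed


def select_condition(t_mdl: float, t_ck: float, tol: float = 0.0) -> Selection:
    """Classify the timing relationship of t_mdl to t_ck."""
    half = t_ck / 2
    if abs(t_mdl - t_ck) <= tol:
        return Selection(3, "CIN", True)     # already locked, bypass SMD
    if abs(t_mdl - half) <= tol:
        return Selection(4, "CIN'", True)    # inverted CIN, bypass SMD
    if half < t_mdl < t_ck:
        return Selection(1, "CIN", False)    # conventional SMD operation
    if t_mdl < half:
        return Selection(2, "CIN'", False)   # inverted CIN into the SMD
    raise ValueError("t_mdl greater than t_ck is outside the described cases")


for t_mdl in (7_000, 2_000, 10_000, 5_050):
    print(t_mdl, select_condition(t_mdl, t_ck=10_000, tol=100))
```

The `clock_source` field names the signal that is forwarded, either into the SMD or directly to the clock tree when `bypass_smd` is true.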


Referring now to Fig. 7, the present invention is shown being used in a delay-locked loop or DLL, which is shown generally by the numeral 200. An external clock signal 216 is input into receiver and buffer 218. This produces clock input signal (CIN) 220. The delay in the signal as it passes through buffer receiver 218 is represented by d<sub>in</sub> 219. CIN signal 220 is input via branch 222 into phase detector 226. CIN signal 220 is also directed via branch 224 into delay line 228. Phase detector 226 may include any associated logical circuitry. The goal of the present invention is to take CIN signal 220 as well as a clock feedback signal 230 (CKFB) and, by defining particular characteristics and relationships about the timing of CIN signal 220 and CKFB 230, to delineate specific conditions under which the signals are operating, and to select and direct the signals accordingly. Although a specific logic arrangement is shown, it is contemplated that any suitable control logic may be used to define the conditions of the signals and then select them accordingly. CKFB feedback signal 230 is a typical feedback loop as is found in a common delay-locked loop (DLL). Phase detector 226 compares the timing of signal CIN and signal CKFB. Based on timing conditions and characteristics of each signal, control signals are sent via control lines 232 to control block 234 and output via lines 236 to delay line 228. The period of the delay is represented by t<sub>delay</sub> 230. Associated with the delay line 228 is selector 238, which receives an input 240 from the phase detector 226 as well as inputs 242 and 244 representative of the clock CLK and inverted clock signals, respectively. Selector 238 selects, based on the input 240 from the phase detector 226, whether to put signal 242 or 244 to input 246 into clock tree driver 248. The period of delay by the driver is represented by t<sub>tree</sub> 250.

The output 252 of the clock tree driver 248 is sent to an output buffer 254 which has an input data line 256 and a data output line 258. The delay by the output of data is represented by the parameter d<sub>out</sub> 260. Clock tree driver 248, as part of the delay-locked loop, feeds back into phase detector 226 via line 230. The delay associated with the I/O model 262 is represented by the parameters $d_{in}$ and $d_{out}$.

Generally speaking,

1. In order to synchronize XCLK with DQs,

$$t_{delay} = t_{ck} - t_{tree} - (d_{in} + d_{out})$$

In traditional DLLs, the delay stages required are:

$$N = \frac{t_{delay}}{t_d} = \frac{t_{ck} - t_{tree} - (d_{in} + d_{out})}{t_d}$$

$$N_{worst} = \frac{t_{ck} (long) - t_{tree} (short) - (d_{in} + d_{out}) (fast)}{t_d (fast)} = \frac{15\text{ ns} - 1\text{ ns}}{110\text{ ps}} \approx 128$$

2. Using the same method, and adding a selector:

$$t_e < t_{ck}/2: \quad t_{delay} = t_{ck}/2 - t_e$$

$$t_e > t_{ck}/2: \quad t_{delay} = t_{ck} - t_e$$

For both cases,

 $t_{delay}$  is less than or equal to  $t_{ck}/2$ 

$$N_{\text{worst}} = \frac{t_{\text{ck}}/2 \text{ (long)} - \text{others}}{t_{\text{d}} \text{ (fast)}} = \frac{7.5\text{ ns} - 1\text{ ns}}{110\text{ ps}} \approx 59$$
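The two-case delay rule for the DLL with a selector can be sketched as below. The function name `dll_delay` is hypothetical, and treating the boundary $t_e = t_{ck}/2$ as belonging to the first case is an assumption, since the rule as stated covers only the strict inequalities. Here $t_e$ is taken to be the time from a rising edge of CIN to the following rising edge of CKFB.

```python
def dll_delay(t_ck: float, t_e: float) -> float:
    """Delay the delay line must supply, given t_e in the range 0..t_ck."""
    if not 0 <= t_e <= t_ck:
        raise ValueError("t_e is expected to lie within one clock period")
    half = t_ck / 2
    if t_e <= half:          # boundary assigned to this case by assumption
        return half - t_e
    return t_ck - t_e


T_CK = 15_000  # illustrative clock period in picoseconds

print(dll_delay(T_CK, 1_000))   # t_e below half a period
print(dll_delay(T_CK, 10_000))  # t_e above half a period

# Sweep t_e over a full period to confirm the delay never exceeds t_ck / 2.
worst = max(dll_delay(T_CK, t_e) for t_e in range(0, T_CK + 1, 50))
print(worst)
```

The sweep illustrates why the delay line only needs to cover half a clock period in this arrangement.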

Referring now to Fig. 8, a timing diagram for signals CIN and CKFB is shown in a particular arrangement. The period from the rising edge 300 to rising edge 302 is designated as $t_{ck}$. The amount of time from rising edge 300 of CIN to rising edge 304 of CKFB is represented by the parameter $t_e$. Additionally, the time from the rising edge 304 of CKFB to the falling edge 306 of CIN is represented by the parameter $t_{delay}$. In this case, $t_{delay}$ is less than or equal to half of $t_{ck}$.


Referring now to Fig. 9, the second case is illustrated where CKFB does not fire until after the first pulse of CIN. Again, $t_{ck}$ is represented by the rising edge 308 of CIN and the next rising edge 310 of CIN. Additionally, the length of time from the rising edge 308 to the rising edge 312 of CKFB is shown by the parameter $t_e$. However, in this instance, $t_{delay}$

Fig. 10 is a block diagram of a computer system 100. The computer system 100 utilizes a memory controller 102 in communication with SDRAMs 104 through a bus 105. The memory controller 102 is also in communication with a processor 106 through a bus 107. The processor 106 can perform a plurality of functions based on information and data stored in the SDRAMs 104. One or more input devices 108, such as a keypad or a mouse, are connected to the processor 106 to allow an operator to manually input data, instructions, etc. One or more output devices 110 are provided to display or otherwise output data generated by the processor 106. Examples of output devices include printers and video display units. One or more data storage devices 112 may be coupled to the processor 106 to store data on, or retrieve information from, external storage media. Examples of storage devices 112 and storage media include drives that accept hard and floppy disks, tape cassettes, and CD read only memories.

While the present invention has been described in conjunction with preferred embodiments thereof, many modifications and variations will be apparent to those of ordinary skill in the art. For example, although the present invention is directed to synchronous mirror delay systems, the present invention is contemplated to be used with any implementable logic devices and in other arrangements, such as in a digital delay locked loop (DDLL), to improve the efficiency in that arrangement. The foregoing description and the following claims are intended to cover all such modifications and variations.